Performance Effects of a Cache Miss Handling Architecture in a Multi-core Processor
Authors
Abstract
Multi-core processors, also called chip multiprocessors (CMPs), have recently been proposed to counter several of the problems associated with modern superscalar microprocessors: limited instruction-level parallelism (ILP), high power consumption and large design complexity. However, the performance gap between a processor core and main memory is large and growing. Consequently, multi-core architectures must invest in techniques to hide the large memory latency. One way of doing this is to use non-blocking, or lockup-free, caches. The key idea is that a cache can continue to service requests while one or more misses are being processed at a lower level of the memory hierarchy. This technique was first proposed by Kroft [16]. The main contribution of this paper is the observation that a non-blocking cache must fulfill two functions. First, it should provide sufficient miss parallelism to speed up the applications. Second, the number of parallel misses should not be so large that it creates congestion in the on-chip interconnect or off-chip memory bus. While the first function is well known, the second arises from the possibility of contention when multiple processors are placed on a single chip. A compromise miss handling architecture (MHA) evaluated in this work, which handles 16 parallel misses in the L1 cache, has an average speed-up of 47% compared to a blocking cache and a hardware cost of 9% of the L1 cache area.
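The non-blocking idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's MHA design: the class name `NonBlockingCache`, its methods, and the return labels are all hypothetical, and the model ignores cache capacity, eviction, and timing. It only shows a cache that keeps accepting requests while misses are in flight, merges repeated misses to the same block, and stalls once a fixed limit on parallel misses is reached.

```python
class NonBlockingCache:
    """Toy model (hypothetical): serves requests while misses are outstanding."""

    def __init__(self, max_parallel_misses=16):
        # 16 mirrors the compromise MHA mentioned in the abstract;
        # the limit is a free parameter of this sketch.
        self.max_parallel_misses = max_parallel_misses
        self.resident = set()      # block addresses currently in the cache
        self.outstanding = {}      # block address -> request ids waiting on it

    def access(self, block, request_id):
        if block in self.resident:
            return "hit"
        if block in self.outstanding:
            # A miss to this block is already in flight: merge with it
            # instead of sending a second request to the lower level.
            self.outstanding[block].append(request_id)
            return "merged"
        if len(self.outstanding) >= self.max_parallel_misses:
            # Miss-handling capacity is exhausted: this request must wait.
            return "stall"
        self.outstanding[block] = [request_id]
        return "miss"

    def fill(self, block):
        """Data arrives from the lower level; release the waiting requests."""
        waiting = self.outstanding.pop(block)
        self.resident.add(block)
        return waiting


cache = NonBlockingCache(max_parallel_misses=2)
results = [
    cache.access(0xA, 1),  # first miss to block 0xA
    cache.access(0xB, 2),  # second miss, still within the limit
    cache.access(0xA, 3),  # same block as request 1, so it is merged
    cache.access(0xC, 4),  # a third distinct miss exceeds the limit
]
released = cache.fill(0xA)
```

A small `max_parallel_misses` makes the model stall often, while a large one lets many requests reach the lower level at once, which is the trade-off between miss parallelism and congestion that the abstract describes.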
Similar resources
Performance of Cache Memory Subsystems for Multicore Architectures
Advancements in multi-core have created interest among many research groups in finding ways to harness the true power of processor cores. Recent research suggests that on-board components such as cache memory play a crucial role in determining the performance of multi-core systems. In this paper, the performance of cache memory is evaluated through parameters such as cache access time, miss ra...
A High Performance Adaptive Miss Handling Architecture for Chip Multiprocessors
Chip Multiprocessors (CMPs) mainly base their performance gains on exploiting thread-level parallelism. Consequently, powerful memory systems are needed to support an increasing number of concurrent threads. Conventional CMP memory systems do not account for thread interference, which can result in reduced overall system performance. Therefore, conventional high bandwidth Miss Handling Architect...
Measuring Performance Degradation in Multi-core Processors due to Shared resources
Resource sharing in multicore processors can lead to many side effects, most of which are undesirable. The resulting cross-core interference is a major performance bottleneck. It is important that chip multiprocessors (CMPs) incorporate methods that minimise this interference. To do so, an accurate measure of cross-core interference needs to be devised. This paper studies the re...
Architecture Aware Programming on Multi-Core Systems
To improve processor performance, the industry has responded by increasing the number of cores on the die. One salient feature of multi-core architectures is that they have a varying degree of sharing of caches at different levels. With the advent of multi-core architectures, we are facing a problem that is new to parallel computing, namely, the management of hierarchica...
Effects of Main Memory Latencies on the Performance of Nonblocking Caches
Lockup-free caches in conjunction with non-blocking processor loads have been proposed to hide miss latencies in high performance processors. One problem with current approaches is the increased complexity of the processor and of the cache controller due to non-blocking operation. In this paper, we introduce a simple mechanism to support non-blocking loads and a lockup-free cache. A modified SPARC architec...